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Summary 

In this work we provide a simple estimation procedure for a general frailty model for 
analysis of prospective correlated failure times. Rigorous large-sample theory for the pro- 
posed estimators of both the regression coefficient vector and the dependence parameter 
is given, including consistent variance estimators. In a simulation study under the widely 
used gamma frailty model, our proposed approach was found to have essentially the same 
efficiency as the EM-based estimator considered by other authors, with negligible differ- 
ence between the standard errors of the two estimators. The proposed approach, however, 
provides a framework capable of handling general frailty distributions with finite moments 
and yields an explicit consistent variance estimator. 

Key words: Correlated failure times; EM algorithm; Frailty model; Prospective family 
study; Survival analysis. 
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1 Introduction 



Many epidemiological studies involves failure times that are clustered into groups, such as 
families or schools, where some unobserved characteristics shared by the members of the 
same cluster (e.g. genetic information or unmeasured shared environmental exposures) 
could influence time to the studied event. In frailty models within cluster dependence is 
represented through a shared unobservable variable as a random effect. Estimation in the 
frailty model has received much attention under various frailty distributions, including 
gamma (Gill, 1985, 1989; Nielsen et al., 1992; Klein 1992, among others), positive stable 
(Hougaard, 1986; Fine et al., 2003), inverse Gaussian, compound Poisson (Henderson 
and Oman, 1999) and log-normal (McGilchrist, 1993; Ripatti and Palmgren, 2000; Vaida 
and Xu, 2000, among others). Hougaard (2000) provides a comprehensive review of the 
properties of the various frailty distributions. In a frailty model, the parameters of interest 
typically are the regression coefficients, the cumulative baseline hazard function, and the 
dependence parameters in the random effect distribution. 

Since the frailties are latent covariates, the Expectation-Maximization (EM) algorithm 
is a natural estimation tool, with the latent covariates estimated in the E-step and the 
likelihood maximized in the M-step by substituting the estimated latent quantities. Gill 
(1985), Nielsen et al. (1992) and Klein (1992) discussed EM-based maximum likelihood 
estimation for the semiparametric gamma frailty model. One problem with the EM al- 
gorithm is that variance estimates of the estimated parameters are not readily available 
(Louis, 1982; Gill, 1989; Nielsen et al., 1992; Andersen et al., 1997). It was suggested 
(Gill, 1989; Nielsen et al, 1992) that a nonparametric information calculation could yield 
consistent variance estimators. Parner (1998), building on Murphy (1994, 1995), proved 
the consistency and asymptotic normality of the maximum likelihood estimator in the 
gamma frailty model. Parner also presented a consistent estimator of the limiting covari- 
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ance matrix of the estimator based on inverting a discrete observed information matrix. 
He noted that since the dimension of the observed information matrix is the dimension 
of the regression coefficient vector plus the number of observed survival times, inverting 
the matrix is practically infeasible for a large data set with many distinct failure times. 
Thus, he proposed another covariance estimator based on solving a discrete version of a 
second order Sturm-Liouville equation. This covariance estimator requires substantially 
less computational effort, but still is not so simple to implement. 

The purpose of our work here is to develop a new inference technique that can handle 
any parametric frailty distribution with finite moments. Our new method possesses a 
number of desirable properties: a non-iterative procedure for estimating the cumulative 
hazard function; consistency and asymptotic normality of parameter estimates; a direct 
consistent covariance estimator; and easy computation and implementation. The rest 
of the paper is organized as follows. In Section 2 we present the estimation procedure. 
Consistency and asymptotic results for the estimators are given in Section 3. As the 
frailty model is often applied using a gamma frailty distribution, Section 4 compares the 
finite sample performance of our approach and the EM-based approach under the gamma 
distribution. Section 5 provides an example using a diabetic retinopathy data set. Section 
6 presents concluding remarks. 

2 The Proposed Approach 

Consider n families, with family i containing m 8 members, i — 1, . . . , n. Let = I(T-j < 
Cij) be a failure indicator where T°- and Cy are the failure and censoring times, respec- 
tively, for individual ij. Also let T^- = min(T^, C^) be the observed follow-up time and 
Zjj be a p x 1 vector of covariates. In addition, we associate with family % an unobservable 
family-level covariate Wi, the "frailty", which induces dependence among family mem- 
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bers. The conditional hazard function for individual ij conditional on the family frailty 
Wi, is assumed to take the form 

Xij(t) = Wi\ (t) exp(/3 T Zjj) % = 1, . . . , n j = 1, . . . , to, 

where Ao is an unspecified conditional baseline hazard and (3 is a p x 1 vector of unknown 
regression coefficients. This is an extension of the Cox (1972) proportional hazards model, 
with the hazard function for an individual in family i multiplied by Wi. We assume that, 
given and Wi, the censoring is independent and noninformative for Wi and (/3, A ) (in 
the sense of Andersen et al., 1993, Sec. III. 2. 3). We assume further that the frailty Wi 
is independent of Z^- and has a density f(w; 9), where 9 is an unknown parameter. For 
simplicity we assume that 9 is a scalar, but the development extends readily to the case 
where 9 is a vector. Let r be the end of the observation period. The full likelihood of the 
data then can be written as 

L= Wl =l j UfM^m 3 )}^S tJ (T tJ )f(w)dw 
= n^n^AoC^) exp(/3 T Z^)} 5 -n™ =1 J w N ^ exp{-wH l .(r)}f(w)dw, (1) 

where N^t) = kA T H < *), ^-(0 = E^i^W, H tJ (t) = A (T ij A t) exp(/3 T Z^) , 
aAb = min{a, 6}, A (-) is the baseline cumulative hazard function, >%(•) is the conditional 
survival function of subject ij, and H, L (t) = YJ£L\Hij{t). The log-likelihood is given by 

i = EE MV^) exp(/3 T Z^)} + £ log / u^-M exp{-^.(^)}/M^ • 

i=lj=l i=l ^ J 

The normalized scores (log-likelihood derivatives) for (f3\, . . . ,f3 p ) are given by 
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n mi 



1 » [Egi^g^jj J^ (T)+1 exp{- W ff^)}/M^ 



x ^ — ^ ^ — ^ r -L r — > 4 j = ± — b J \ *>J / — 6 J 1 J — j. ^ t * \ / J •/ \ — / 



i=i j=i 



for r = 1, . . . ,p. The normalized score for 9 is 



1 " J w N > ^ exp{-wH t .(T)}f(w)dw 
p+l ~nfr[ J w N >M exp{-wHiXT)}f(w)dw 
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where f'(w) = f f(w). Let 7 = (f3 T ,6) and U(7,A ) = (U u . . . , U p , U P+X ) T . To obtain 
estimators (3 and 9, we propose to substitute an estimator of A , denoted by A , into the 
equations U(7, A ) = 0. 

Let Yij(t) = I(Tij > t) and let T t denote the entire observed history up to time t, that 



is 



T t = ct{A^(«), Yy(u), Z i:j ,i = 1, . . . , n; j = 1, . . . , m*; < u < t}. 

Then, as discussed by Gill (1992) and Parner (1998), the stochastic intensity process for 
Nij(t) with respect to T t is given by 

A (t) exp(/3 T Z ij )F ij (t)^(7, Ao, t-), (3) 

where 

i> i (j,A Q ,t)=E(W i \F t ). 

Using a Bayes theorem argument and the joint density (0) with observation time restricted 
to [0, t), we obtain 

^(7, A, t) = 2l (7, A, t)/0 H (7, A, t), 

where 

fci ( 7 ,A o ,t) = Jw N ^ + ^exp{-wH L (t)}f(w)dw, k = l,...,4. 

Given the intensity model Q, in which exp(/3 T Z)-^j(7, A , t— ) may be regarded as a time 
dependent covariate effect, a natural estimator of A is a Breslow (1974) type estimator 
along the lines of Zucker (2005). For given values of (3 and 8 we estimate Ao as a step 
function with jumps at the observed failure times t&, k = 1, . . . , K, with 

AA (r fe ) = * - = (4) 

E?=i Vi(7, Ao, T*-!) Ef=i fyfa) exp(/3 T Z^) 

where t4 is the number of failures at time T&. Note that given the intensity model (J3j), the 

estimator of the fcth jump depends on Aq up to and including time r^-i- By this approach, 



we avoid complicating the iterative optimization process with a further iterative scheme, 
like that of Shih and Chatterjee (2002), for estimating the cumulative hazard. 

3 Large-Sample Study 

Let 7 = (/3 oT , 9°) T with (3°, 9° and A^(t) denoting the respective true values of f3, 6 and 

^ x * 

A (t), and let 7 = {(3 ,0) T . In Appendix A, the conditions assumed in establishing the 
asymptotic properties of 7 are listed and discussed. 

Using arguments similar to those of Zucker (2005, Appendix A. 3), the following can 
be shown (see, Appendix A): 

A. A (t, 7) converges almost surely to A (t,7) uniformly in t and 7. 

B. U(7, A (-, 7)) converges almost surely uniformly in t and 7 to a limit u(7, A (-, 7)). 

C. There exists a unique consistent root to U(7, A (-, 7)) = 0. 

To show that 7 is asymptotically normally distributed, we write 

0= U(7,A (., 7 )) 
= U( 7 °, A°) + [U( 7 °, A (-, 7 )) - U( 7 °, A°)] 
+ [U(7,Ao(-,7))-U(7 ,A (-,7°))]. 

In Appendix B we analyze each of the above three terms and prove that n^ 2 (-y - 7 ) 
is asymptotically mean-zero normally distributed, with a covariance matrix that can be 
consistently estimated by the sandwich estimator 

D^(7){V( 7 ) + G(7) + CmV- l (l) T . (5) 

The matrix D consists of the derivatives of the C/ r 's with respect to the parameters 7. V 
is the asymptotic covariance matrix of U(7°, Aq), G is the asymptotic covariance matrix 
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of [U(7°, A (-,7°)) — U(7°,Aq)], and C is the asymptotic covariance matrix between 
U(7°,Ag) and [U(7°, A (-, 7°)) - U(7°, Ag)]. The term G + C reflects the added variance 
resulting from the need to estimate the cumulative hazard function. All the above matrices 
are defined explicitly in the Appendix. 



4 Simulation Study for the Gamma Frailty Case 

Gill (1985), Klein et al. (1992), and Nielsen et al. (1992) dealt with the gamma frailty 
model by applying the EM algorithm to the Cox partial likelihood. This methods may be 
interpreted as a semi-parametric full maximum likelihood method. Murphy (1994, 1995) 
showed consistency and asymptotic normality for the model without covariates, where the 
unknown parameters are the integrated hazard function and the gamma frailty param- 
eters. Parner (1998) extended the consistency and asymptotic normality results to the 
correlated gamma frailty model with covariates. In what follows we compare our proposed 
method to the EM method under the gamma frailty distribution with expectation 1 and 
variance 9. 

The following is the EM-based estimation algorithm as given in Nielsen et al. (1992). 

Step I: Using standard Cox regression software, obtain initial estimates of (3 and A , 
taking Wi — 1, % — 1, . . . , n (i.e. 9 = 0). 

Step II (E step): Using the current values of f3, A and 9, estimate the frailty value Wi 
by 

Step III (M step): Update the estimate of (3 by fitting a Cox proportional hazard 
model with covariates Z and offset term log(IU). Update the estimate of A by 
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the traditional Breslow type estimator associated with the Cox model. Update the 
estimate of 9 by the maximum likelihood estimator based on (JIJ). 

Step IV: Iterate between Steps II and III until convergence. 

Our estimation technique can be summarized by the following algorithm. 

Step I: Using standard Cox regression software, obtain initial estimates of f3 and as 
initial value for 9, let 9 = 0. 

Step II: Using the current values of (3 and 9, estimate A using the non-iterative estimate 
presented by Equation (jlj). 

Step III: Using the current estimate A , estimate (3 and 9 by solving U(7, A) = 0. 

Step IV: Iterate between Steps II and III until convergence. 

It is easy to see that under the gamma distribution for Wj, 

WT ,A , t _) = E W F t -) = ||^- m 

Murphy (1994) showed that for the model without covariates, an estimator of the cumu- 
lative hazard function based on the EM algorithm with (JJJ) instead of © converges to 
the true value of the cumulative hazard function. This result can be extended to the case 
where covariates are included in the model. 

Note that in Murphy (1994), the cumulative hazard function at includes the cu- 
mulative information up through time 7/-, whereas in the EM algorithm the accumulated 
information is up through time r, the entire study period. In contrast, in our approach 
the cumulative hazard function at only includes the information up through the previ- 
ous failure time point Tk-i- Hence, one might suspect our estimators are somewhat less 
efficient than the EM-based estimators. Part of the goal of our simulation study was to 
assess the extent of this potential efficiency loss. 
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The setup for the simulation study is similar to that of Hsu et al. (2004) for investi- 
gating a semiparametric estimation of marginal hazard function from case-control family 
study with the required modifications for the current prospective setting. For each family 
we generated a common frailty value W from the gamma distribution with scale and shape 
parameters both equal 6~ l . We consider 300 families, each of size 2. A single covariate 
from the standard normal distribution was incorporated. Conditional on W, the survivor 
function is 

S(t\Z, W) = exp{-W exp((3Z)(0.01t) 4S } 

Thus, with (5 = ln{2) or ln(3) and a normal distribution for the censoring, with mean 60 
years and standard deviation of 15 years, the censoring level is approximately 85% and 
80%, respectively. The censoring distribution was chosen in order to generate appropriate 
mean age at onset and distribution, similar to what is often observed for late onset diseases. 
With censoring distributed according to 7V(130, 15 2 ) the respective censoring levels are 
approximately 35% and 30%. Table 1 summarizes the results for the two estimation 
techniques, for (5° = ln{2) or Zn(3) and 9° = 2. For our method, we compare the mean 
estimated standard error based on our theoretical formula with the empirical standard 
error, and provide the empirical coverage rate of 95% Wald-type confidence interval. For 
the EM-based method, we report only the empirical standard error. In addition, the 
empirical correlation between the EM-based estimators and our estimators is presented. 
It is evident that both estimation techniques perform very well in term of bias. Also, 
for our method, good agreement was observed between the estimated and the empirical 
standard error. The high values of the correlations implies similarity between the two 
estimation techniques not only on an average basis, but actually on a data set by data 
set basis. 
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5 Example 

We now apply our method under the gamma frailty distribution to a diabetic retinopa- 
thy data set. The Diabetic Retinopathy Study (DRS) was begun in 1971 to study the 
effectiveness of laser photocoagulation in delaying the onset of blindness in patients with 
diabetic retinopathy. Patients with diabetic retinopathy and visual acuity of 20/100 or 
better in both eyes were eligible for the study. For each study subject, one eye was 
randomly selected for treatment laser photocoagulation and the other eye was observed 
without treatment. The outcome variable is time to blindness of each eye. For illustrative 
purposes the following analysis involves 197 high-risk patients as defined by DRS criteria. 
Of the 394 measurements, 239 (61%) are censored. The regression coefficient estimate of 
the treatment effect was —0.890 and —0.910 according to our proposed estimator and the 
EM algorithm, respectively. The respective estimated standard errors, 0.175 and 0.174, 
are based on 50 bootstrap samples. The estimate of 9 was 0.865 with our approach, and 
0.857 with the EM approach. The respective estimated standard errors are 0.367 and 
0.365. As one can see, both method yield extremely similar results. Both indicated that 
the treatment appeared effective in delaying the time to blindness, and that the times to 
blindness for both eyes are highly dependent. The hazard rate of one eye becoming blind 
given the other eye is blind is almost twice {1+9) as high as that given the other eye is 
not blind. 

6 Discussion 

We have presented a method for estimating the regression coefficient vector and frailty pa- 
rameter in a prospective frailty survival model. The procedure is applicable to any frailty 
distribution with finite moments. We have shown that our estimators of the regression 
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coefficients and frailty parameter are consistent and asymptotically normally distributed, 
and given an explicit consistent estimator for the variances of the parameter estimates. 
For the popular gamma frailty model, we have presented simulation results showing that 
our estimator is essentially as efficient as the estimator based on the EM algorithm. For 
our procedure, a consistent covariance estimator is available which is much easier to com- 
pute than its counterpart for the EM method as given by Parner (1998). Nonconjugate 
frailty distributions can be handled by a simple univariate numerical integration over the 
frailty distribution. 

The estimation approach used here for estimating the cumulative hazard function can 
be applied in some other important settings, such as the case-control family study. Our 
approach avoids an iterative procedure for the A , enabling the asymptotic properties of 
the estimator to be derived in a relatively straightforward fashion. Shih and Chatterjee 
(2002) proposed a semi-parametric quasi-partial-likelihood approach for estimating the 
regression coefficients in survival data from a case-control family study. Their cumulative 
hazard estimator requires an iterative solution, and thus the properties of their estimates 
could only be investigated so far by a simulation study. If their method is modified by 
using our approach to estimating A (tt), the proof presented in Appendix B can serve 
as a basis for the asymptotic properties of the resulting procedure, with appropriate 
modifications. The extension to this case will be presented in a separate paper. 

7 Appendix: Asymptotic Theory 
7.1 Assumptions and Background 

In deriving the asymptotic properties of 7 we make the following assumptions: 

1. The random vectors (7$, . . . , T^, C a , . . . , C irrii , Z a , . . . , Z imu W t ), i = 1, . . . , n, are 
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independent and identically distributed. 

2. There is a finite maximum follow-up time r > 0, with E[X^j=i Yij(r)] = y* > for 
all i. 

3. (a) Conditional on and Wi, the censoring is independent and noninformative 

of Wi and (/3, A ). 

(b) Wi is independent of Z^- and of mj. 

4. The frailty random variable PVj has finite moments up to order (m + 2), where m is 
a fixed upper bound on m^. 

5. Z,j is bounded. 

6. The parameter 7 lies in a compact subset Q of IR P+1 containing an open neighbor- 
hood of 7 . 

7. There exist b > and C > such that 

limurM/H = C. 

tu— +0 

8. The baseline hazard function Ag(i) is bounded over [0,r] by some constant A ma:r . 

9. The function f'(w;9) = (d/d9)f(w;9) is absolutely integrable. 

10. The censoring distribution has at most finitely many jumps on [0, r]. 

11. The matrix [(£?/97)U( / y, Ao(-, 7))]| T=T ° is invertible with probability going to 1 as 
n — > 00. 

The matrix (d / 8^)1] (-y , A (-, 7)) is presented explicitly in Section 7.3, Step IV. From 
(p£i)) -(pT7 J) . it is seen that a general proof of invertibility is intractable, but given the data, 
one can easily check that numerically the matrix is invertible. 
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7.2 Technical Preliminaries 

Here we present some technical results that are needed for the asymptotic theory. First 
note that, by the boundedness of (3 and Z ij: there exists a constant v > such that 

v- 1 < exp{f3 T Zij) < v. (8) 

Next, recall that 

_ jw N ^ +1 e~ H ^ w f{w)dw 
~ Jw N ^)e- H ' ( t ) w f{w)dw ' 

with ^.(t) = #,.(t, 7 , A) = E7=i A(T 4j A t) exp(/3 T Z lj ) (here we define H^. so as to allow 
dependence on a general 7 and A, which will often not be explicitly indicated in the 
notation). Define (for < r < m and h > 0) 

_ /w r+1 e-^/H^ 
V lr ' j Jw-e- hw f(w)dw ' 

Also define ^ in (h) = min < r < m ip*(r, M an d ipmax(h) = maxo< r < m V*( r > ft)- Note that, in 
the expression for ip*(r, ft), the numerator and denominator are bounded from above by 
the assumption that W has finite (m + 2)-th moment. In addition, the numerator and 
denominator are by necessity strictly positive, for otherwise W would have a degenerate 
distribution concentrated at 0. Thus ip^axW * s finite and ^^(ft) is strictly positive. 

Lemma 1: The function ip*(r,h) is decreasing in ft. In consequence, we have, for all 
7 G Q and all t, 

lfc(7,A,f) < Ca,(0), (9) 
^( 7 ,A,t) > r mn (muA(t)). (10) 

In addition, there exist B > and ft, > such that, for all ft > ft, 

r mi n(h) > Bh-\ (11) 
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Proof: We have 



J w r+2 e- hw f(w)dw ( Jw r+1 e- hw f(w)dw" 



(12) 



J W r e -hwf( w j dw y j W r e -hwf( w j dw J 

This quantity is negative for all h, which establishes that ip*(r, h) is a decreasing function 
of h. Now, by definition, ^(7, A,t) = ip*(Ni(t), H t .(t)). We have < H^t) < muA(t). 
The inequalities Q and (fTU|) follow immediately. 

As for (|11|). using a change of variable and Assumption 7, we find that 

lim hijj*(r, h) = — -7— — = r + b. 

Choosing h large enough so that the above limit is obtained up to a factor of, say, 1.01, 
the result follows. 



We define 



A = 1.03e r 



mv 



with a = l.Olmu 2 /(By*), where h and B are as in Lemma 1. 



Lemma 2: With probability one, there exists n' such that, for all t G [0, r] and 7 G 



A (t,7) < A for n > n'. 



(13) 



Remark The point of this lemma is that A (t, 7) is automatically bounded above, without 
any need to impose an upper bound artificially. 

Proof: To simplify the writing below, we will suppress the argument 7 in A Q (t, 7). Recall 

1 



AAo(T fc ) 



Er=i Ml, A o, n-i) Ef=i Y tJ (n) exp(/3 T Zy) ' 



where we now take dk = 1 since the survival time distribution is assumed continuous. 
Using Lemma 1 and (jHJ), we have 

n rrii 



AA (r fc ) < n 1 v^* min (mv k(r k ^)) 1 



n 



i=i j=i 



T 
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Now, since Y^=\ Yij( T ) are iid random variables with expectation y*, by the strong law of 
large numbers we have 

i n rrii 
a i=l j=l 

almost surely. Hence, with probability one, there exists n* such that 

-i n n%i 

Y iA T ) > 0.999y* for n > n*. (14) 

n i=l j=l 

We thus have, for n > n*, 

AA (r fc ) < n" 1 (H^J ^(TTU/ACrfc-i))- 1 . (15) 

Now, if A (t) < h/{mv) for all t then we are done. Otherwise, there exists k' such that 
Aq(t^) < hjimv) for k < k' and Ao(Tfc) > hjimv) for > A;'. Using the last inequality of 
Lemma 1, we obtain, for k > k', 

AA (r fe ) < 71-VAo(t^_i), 

or, in other words, 

A (r fc ) <(l + ?\ Ao(-nfc_i). 
Iterating the above inequality we get 

f <j\ l * ( a \mn A 

A (r fe , + ,) < f 1 + "J A o(^) < (l + -J A (r fe ,) < 1.01e m<J A (r,v) 
for n large enough. But, using (JT5j) and the fact that Aq(t/-/_i) < h/(mu), we have 

which is less than 1.01hj(mv) for n large enough. The desired conclusion follows. 
Lemma 3: sup sg [ 0jT ] |A (s,7°) — A (s— ,7°)| — > as n — > oo. 
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Proof: Since A (s,7) — A (s— ,7) equals AA (Tfc) for s = r k and zero otherwise, it is 
enough to show that sup fc AA (rfc, 7 ) — > as n — > 00. But from Lemma 2 and (|T3|) we 
have 

AA (r fc ) < n- 1 fH^) CnM)- 1 
for n sufficiently large. The conclusion follows immediately. 

7.3 Consistency 

We now show the almost sure consistency of (3 and A . The argument is built on 
Claims A-C of Section 3, which we prove below. Our argument follows Zucker (2005, 
Appendix A. 3). 

Claim A: Ao(i, 7) converges a.s. to some function Ao(t, 7) uniformly in t and 7. 
Proof: In the proof below, whenever a functional norm is written, the relevant uniform 
norm is intended. 

Define A max = max(A, \ max r) and if)**{r, h) = i/;*(r, h A h max ), where h max = mvk max . 
It is easy to see from (fT2"|) that tp**(r, h) is Lipschitz continuous in h (uniformly in r). Recall 
that ip i ('y,A,t) = ip*(Ni(t), Hi.(t, 7, A)). But Lemma 2 implies that H^t, 7, A (-, 7)) < 
h max for all t E [0,r] and 7 e Q. Hence we see that tp^j, A (-, 7), t) = ip**{Ni(t), H^t, 7, A (-, 7))). 

Now define, for a general function A, 



S„(t, 7 ,A) = / 
Jo 

and 



n- 1 E?=i E, m =i ^(Niis-IH^- 7, A))^(s) exp(/3 J Z, 



^ an = /■« gggi #,(*- 7°, Ag))^-(a) exp(/3° r Z -)] 

^' 7 ' J io E[E7= 1 ^(iV,(3-),^.( S -,7,A))^( S -)exp(/3 T Z ij )] ' 

By definition, A (i, 7) satisfies the equation 

Ao(t,7) = £ n (t,7,Ao(-,7))- ( 16 ) 
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Next, define 

, A) = gggi rm-s-), Hi.(8- 7°, Kmjjs) exp(/3° r Z tJ )] 
W ' J E[E^i^*(iVi(a-), J ffi.(a-,7,A))yy( S )ex P 09 r Zy)] oW ' 

This function is uniformly bounded by B* = [ipmaxi®) l^min{^max)]^max- Moreover, by the 
Lipschitz continuity of ip**(r, h) with respect to h, it satisfies the Lipschitz-like condition 
(for some constant K) 

\c[f(s, Ai) — g 7 (s, A 2 )| < K sup |Ai(«) - A 2 («)|. 

0<M<S 

Hence, by mimicking step by step the argument of Hartman (1973, Theorem 1.1), we find 
that the equation A(t) = S(£, 7, A) has a unique solution. We denote this solution by 
Ao(t, 7). The claim then is that Ao(t, 7) converges almost surely (uniformly in t and 7) 
to this function A (t, 7). Though it may be possible to prove this claim directly, we shall 
use a convenient indirect argument. 

Define Ao^(£,7) to be a modified version of Ao(t, 7) defined by linear interpolation 
between the jumps, where we have added the superscript n for emphasis. Lemma 3 implies 
that, with probability one, 

sup|A( n) (t,7)-A (t,7)| ^0, (17) 

t,t 

and thus 

sup \E n (t, 7, A (t, 7)) - S n (t, 7, A (t, 7)) I -> 0. (18) 

Lemma 2 shows that the family C = {Aq^(£, 7), n > n'} is uniformly bounded. We will 
establish in a moment that £ is also equicontinuous. It then follows, by the Arzela-Ascoli 
theorem, that the closure of C in C([0, r] x Q) is compact. 

The equicontinuity of C is shown as follows. Recall that N^t) = YJjii Nij(t). Write 
N(t) = n^E^iE^i^W- We have N(t) -> E[Ni(t)] as n -> 00 uniformly in t with 
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probability one, with 



mm 



E 



^^(iV l ( S -),^.( S -,7 ,A°))y iJ ( S )exp(/3 oT Z, 

3=1 



Aq(s)o?s. 



In view of this and ()14|) there exists a probability-one set of realizations Q* on which the 
following holds: for any given e > 0, we can find n"(e) such that sup t \N(t) — -E[A^(t)]| < 
e/(4B°) for all n > n"(e), where B° = 1.01u/[i(:^ nin (h rnax )y*]. In consequence, for all t and 
u with u < t, we find that 



A (t,7) " Ao(m,7) 
satisfies 



« n- 1 Eti ££i il>**(Ni(s-), Hi.(s-, 7, A))^(s) exp(/3 J Z 



A (t,7) - A Q (w,7) < £*(£- «) + - for all n > n"(e) 



(19) 



Moreover, it is easy to see that A (t, 7) is Lipschitz continuous in 7 with Lipschitz constant 
C*, say, that is independent of t. 

These two results imply that £ is equicontinuous. This is seen as follows. For given e, 
we need to find 5* and 5% such that |Aq (t, 7) — Aq (u, 7) | < e whenever \t — u\ < 51 and 
I Aq (t, 7) — Aq (t, 7') I < e whenever H7 — 7'|| < 5%- The latter is easily obtained using 
the Lipschitz continuity of Ao(t, 7) with respect to 7. As for the former, for n > n"(e) this 
can be accomplished using ()19jl. while for n in the finite set n' < n < n"(e) this can be 
accomplished using the fact that the function Aq^(£,7) is uniformly continuous on [0, r] 
for every given n. 

We have thus shown that C is (almost surely) a relatively compact set in the space 
C([0,t]xQ). 
Next, define 

1 n mi 

A(7,A,s) = -^^^*(iV t ( S -),^.( S -,7,A))^( S )exp(/3 T Z l:; ), 
n »=i i=i 

0(7, A, s) = E ^^*(iy,( S -),^.( S -,7,A))^( S )exp(/3 T Z. y 
3=1 
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We show below that, with probability one, 

sup|A( 7 ,A (n) ,s) -a( 7 ,A (n) ,s)| -4 0. (20) 
Given this and the a.s. uniform convergence of N(t) to E[A^(t)], we can infer that 

8up|S B (t,7,Aj n) (t,7))-S(t,7,Aj n) (t, 7 ))| - 0. (21) 
ta 

The result (|21|) is easily obtained by adapting the argument of Aalen (1976, Lemma 6.1), 
making use of the equicontinuity of C. It is here that we make use of Assumption 10, for 
the adaptation of Aalen's argument requires 0(7, A, s) to be piecewise continuous with 
finite left and right limits at each point of discontinuity. 

From (US!), (H7D, (HEJ), and (jZEJ) it follows that any limit point of {A[, ra) (t,7)} must 
satisfy the equation A = E(t, 7, A). Since A (i, 7) is the unique solution of this equation, 
it is the unique limit point of {Aq (t, 7)}. Thus {Aq (i, 7)} is a sequence in a compact 
set with unique limit point A (t, 7). Hence Aq (t, 7) converges a.s. uniformly in t and 7 
to A (t, 7). In view of (fTTj). the same holds of A (t, 7), which is the desired result. 

To complete the proof, we must establish (|20j) . This involves several steps. First, it is 
easy to see that there exists a constant k (independent of 7 and s) such that 

sup|A(7,A 1 , S )-A( 7 ,A 2 , S )| < «||Ai-A 2 ||, (22) 
sup |a(7, Ai, s) - 0(7, A 2 , s)| < /c||Ai — A 2 ||. (23) 

Next, for any fixed continuous A, the functional strong law of large numbers of Andersen 
& Gill (1982, Appendix III) implies that, with probability one, 

sup|A( 7 ,A, S )-a(7,A, S )|^0. (24) 

Now, given e > 0, define the sets {tj }, {'J^}, and {A; } to be finite partition grids of 
[0,r], Q, and [0,A max ], respectively, with distance of no more than e between grid points. 
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Define C* to be the set of functions of t and 7 defined by linear interpolation through 
vertices of the form (tf\ 7^. , A^). 

Obviously C* is a finite set. Hence, in view of (I24J) . there exists a probability-one set 
of realizations fl e for which 

sup \A(j,A, s) - 0(7, A, s)| — > 0. (25) 
se[o,T],-yse,Ae£* 

Define 

= n aw 

e=i 

and ^0 = ^* H fi**, with Q* as defined earlier. Clearly Pr(fio) = 1- From now on, we 
restrict attention to f2 . 

Now let e > be given. Choose i > e~ l . In view of (|T9|) and (}25|) . we can find for any 
uj E Qo a suitable positive integer n(e,uj) such that, whenever n > n(e,u), 

\A^(t n )-A^(un)\<B*(t-u) + ^ Vf,u, (26) 

sup |A(7,A,s)-a( 7 ,A,s)| < e. (27) 
se[o,r],-yeff,Ae£* /f 

Next, let Aq" - -* denote the function defined by linear interpolation through {t^\ A^j)), 
where A^ is the element of {A^} that is closest to Aq (4 , 7^ ). It is clear that 

Using (J26|) and the Lipschitz continuity of Aq^(£,7) with respect to 7 (which follows from 
the corresponding property of A (t, 7)), we thus obtain 

sup|A^(t,7)-AS n) (t,7)l<^ 
t,i 

for a suitable fixed constant B** (depending on B* and C*). Combining this with (f2*Tj) 
and (J22J), we obtain 

sup|A(7,A (n) ,s) -a( 7 ,A (n) ,s)| < (2kB** + l)e for all n > n(e, w). 
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Since e was arbitrary, the desired conclusion ()2(J|1 follows, and the proof is thus complete. 



Remark Note that Ao(-, 7°) = Ag(-) since Aq trivially solves the equation A = S(t, 7 , A). 

Claim B: With probability one, U(7, A (-, 7)) converges to 11(7, A (-, 7)) = E[U(7, A (-, 7))] 
uniformly in 7 over Q . 

Proof: Since U(7, A (-, 7)) is the mean of iid terms, the functional strong law of numbers 
of Andersen & Gill (1982, Appendix III) implies that U(7, A (-, 7)) converges uniformly 
in 7 almost surely to 11(7, Ao(-, 7)). It remains only to show that 

sup |U( 7 , A (., 7)) - U( 7 , A (-, 7 ))h0 (28) 

almost surely. Now it may be seen easily from the structure of U(7, A) that there exists 
some constant C° (independent of 7) such that 

|U(7,A 1 )-U( 7 ,A 2 )| <C°||A 1 -A 2 ||. 

Given this result along with the result of Claim A, the result (|2*Hj) follows immediately. 

Claim C: There exists a unique consistent root to U(7, A (-, 7)) = 0. 

Proof: We apply Foutz's (1977) theorem on consistency of maximum likelihood type 

estimators. The following conditions must be verified: 

Fl. 9U(7, Ao(-,7))/c?7 exists and is continuous in an open neighborhood about 7 . 

F2. The convergence of 9U(7, Ao(-,7))/c?7 to its limit is uniform in open neighborhood 
of 7 . 

F3. U(7°, A (-,7°)) -> as n ->• 00. 
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F4. The matrix — [<9U(7, A (-, 7))/<97]| 7= y> is invertible with probability going to 1 as 
n — > oo. (In Foutz's paper, the matrix in question is symmetric, and so he stated 
the condition in terms of positive defmiteness. But it is clear from his proof, which 
is based on the inverse function theorem, that the basic condition needed is invert- 
ibility.) 

It is easily seen that Condition Fl holds. Given Assumptions 2, 4, and 5, Condition F2 
follows from the previously-cited functional law of large numbers. As for Condition F3, 
in Claim B we showed that U(7, A (-, 7)) converges a.s. uniformly to 11(7, A (-, 7)) = 
E[U(7, A (-, 7))]. We noted already that A (-,7°) = A (-). Thus all we need is to show 
that E[U(7°, A )] = 0. Since U is a score function derived from a classical iid likelihood, 
this result follows from classical likelihood theory. Condition F4 has been assumed in 
Assumption 11; we noted previously that, given the data, it can be checked numerically. 
With Conditions F1-F4 thus verified, it follows from Foutz's theorem that 7 — > 7 as 
n — > 00 with probability one. 

7.4 Asymptotic Normality 

To show that 7 is asymptotically normally distributed, we write 

0= U(7,A (., 7 )) 

= U( 7 °, A°) + [U( 7 °, A (-, 7 )) - U( 7 °, A°)] 

+ [U(7,Ao(-,7))-U(7 ,Ao(-,7°))] 

In the following we consider each of the above terms of the right-hand side of the equation. 
Step I 

We can write U(7°, Ag) as 

U(7 ,Ao) = ^E&, 

n i=i 
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where £j is a (p + l)-vector with r-th element, r — 1, . . . ,p, given by 

_^ [ggi H t] {r)Z l3r ] Jw N ^ +1 expi-wjHMmw; 6)dw 

Zir-^kjZijr J w N ^) ex P {-wH l .(r)}f(w;9)dw 

and (p + l)-th element given by 

j w N ^)exp{-wHj {r)}f'{w;9)dw 
^ (p+1) ~ Jw N iM exv{-wHiXT)}f(w; 6)dw ' 

Thus U(7°,Aq) is the mean of the iid mean- zero random vectors £j. It hence follows 
immediately from the classical central limit theorem that n5TJ(7°, Aq) is asymptotically 
mean-zero multivariate normal. To estimate the covariance matrix, let be the counter- 
part of with estimates of 7 and A substituted for the true values. Then an empirical 
estimator of the covariance matrix is given by 

1 n 

V(7) = -£af- 

n i=i 

This is a consistent estimator of the covariance matrix since A (t, 7) converges to A (t, 7) 
a.s. uniformly in t and 7 (Claim A), and 7 is a consistent estimator of 7 (Claim C). 
Step II 

Let U r = U r (-y°, A ), r = 1, . . . ,p, and U p +i = 6^+1(7°, A ) (in this segment of the 
proof, when we write (7°,A ) the intent is to signify (7 , A (-, 7 )). First order Taylor 
expansion of U r about Aq, r — 1, . . . ,p + 1, gives 

n 1 / 2 {C/ r (7°,A )-C/ r (7°,A°)} 

= n~ 1/2 £ E Qvrh°, A°, Ty){Ao(Ty, 7°) - AS(Ty)} + o p (l), (29) 
i=l J=l 



where 



r ) ,• \ f . ) <p2ih°, Ag,r) _ 03»(7° ; Ag,r) p* tt (t \ 7 

l " r[ ' ' " ~ ^ ^(7°, K,r) ij ijr Mr,K,r) ij f^ iA ij) ijr 
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for r = 1, . . . ,p, and 

n R- f fe(-y°.Ag,T)^»( 7 °,Ag,T) 4?(7°.AS,t) 
* wll,A,r "'- , "l ffiVM fe(7°,A5,r) 

with = exp(/3 T Zj,) and 

4?(7,A ,t) = / ^W+^-^expl-^^.^)}/'^)^, fc = l,2. 

The validity of the approximation f)29j) can be seen by an argument similar to that used 
in connection with (J31|) below. 

Based on the intensity process Q, the process 

j 

is a mean zero martingale with respect to the filtration !F t . Also, by Lemma 3, we have 
that sup sg [ 0)T ] |A (s,7°) — A (s— ,7°)| converges to zero. Thus, replacing s— by s we 
obtain the following approximation, uniformly over t G [0, r]: 

1 ft n mi 

" J0 i=l j=l 

1 f t r A n mi 

+ -/ {y^Ao)}- 1 -!^,^)}- 1 ]EEdNij(s), (30) 



i=l j=l 



where 



Now let 



-in mi 

y(s,A) = -Y / ^h°,^s)Y / y^s)exp((3 oT Z lJ ). 

71 i=l 3=1 



W(s,r) = {y(s,A° Q + rA)}- 1 



with A = A — Aq. Define W and VV as the first and second derivative of W with respect to 
r, respectively. Then, by a first order Taylor expansion of W(s, r) around r = evaluated 
at r = 1 with Lagrange remainder (Abramowitz and Stegun, 1972, p. 880) we get (after 
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computing the necessary derivatives) 



{y(s, Ao)}- 1 - {y(s, A")}- 1 = W(s, 0) + -W(s, r(s)) 



-r n rrii 
n i=l j=l 



Ri.(s)vii(0,s) 1 



exp(/3 7 Z y ){A (Iij As) - A°(T^ A s)}, (31) 



where -R^m) = exp(/3 T Zj,,)Yy(w), Ri.{u) = Y™=iRij(u), r(s) e [0, 1], 



r)u{r, s) 



( 7 °,A° + rA, S ) j> 2i ( 7 °,Ag + 7-A,s) 



li ( 7 °,Ag + rA,s) |> H (7 ,A§ + rA,s)J 
and hi(r,s) is as defined §7.5 below. We show there that hi(r,s) is o(l) uniformly in r 
and s. 

Let ^ii(s) = r?ii(0, s). Plugging (|3~T|) into (JHOJ) we get 



„t n raj 

Ao^^-ASW^^ 1 / {y( s i ^-o)} 1 E E dMij(s 



— n 



-n 



+ n 



n rrii 



i=l j=l 

t n ™^ I(T kl > S )R k .{s) 7fik(s) r . , \- \- 7V / > 

E E A ou2 exp (^ Z fci ){A (s) - A (s)} ^ ^ diVy(s) 

fc=u=i iJ / l s 5 A o;j i=lj=l 

4 E E /(T %- V f^ lfc(g) exp(^Z M ){A (TH) - AS(T H )} £ £ ^( S ) 

fc=U=l lJ / l s > A oJ/ i=lj=l 

/ E E *) exp(/3 T Z fc/ ){A (T M ) - A°(T fci )} E E 

J0 fe=i i=i z j=i j=i 



The third term of the above equation can be written, by interchanging the order of 
integration, as 



n 



k=l 1=1 i=l 3-- 



{Jo {y(s,A° )y 



■exp(/3 J Z 



ki 



n rrii 



{AqH - A°(u)}dN kl (u)} 



dNij(s) 



{A (s) - A°(s)} £ E t)dNM 

i=l j=l 



where Nij(t) = I{Tij < t) and 

n ij (s,t)=n- 2 / {y(«,AS)}- 2 i2 i .(«)»7ii(«)exp(/3 T Z ii )X;E dJV i 

J S ,17 1 



k=l 1=1 
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Hence we get 

Ao(t,7°)-A^) « n- 1 {^AoirTE^iW 

J0 i=l7 = l 



3 

n m. 



- f {A ( S ,f) - A°( S )}^^{^T( S ) +o(n- 1 )}^ J ( S ) 
Jo »=i i=i 

where 

n m k 

T(s) =n- 2 {y(s,A° )}- 2 Y,J2 I (Tki > s)R k .(s)vik(s)eM^ki). 

k=l 1=1 

The o{n~ l ) is uniform in t (see §7.5) and will be dominated by Q and T, which are of 
order n~ x . Hence the o(n~ r ) term can be ignored. 

Given the all the above, an argument similar to that of Yang & Prentice (1999) and 
Zucker (2005) yields following martingale representation 

where 



m = n 



s<t 



1 + + }dN i3 (s) 

i=l j=l 



Based on (J2TJJ) . we can write 

n rrii T 

U r (Y, Ao) - ^(7°, Aq) « n 1 £ E / %r(7°, A°, S ){A (s, 7 °) - A°(s)}dA\(s). 

i=l j=l J ° 

Plugging the martingale representation (|32j) into the above equation and interchanging 
the order of integration gives 

[/ r (7°,A )-[/ r (7 o ,A°) 

~ r &rrh°,Kt) f f PQ~) Efc=i Egj dM k i(s) ~ 

^Y^f, ^^^^ , (33) 
Jo y(s,A ) 



where 



7r r (s,7,A ) = n y it- 



p(*) 
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Therefore, n 1/2 [U(7°, A (-, 7°)) - U(7°, A° (-, 7 ))] is asymptotically mean zero multivari- 
ate normal with covariance matrix that can be consistently estimated by 

-if < " A ^ ( A U^l^l^W 



G ri (7)=n / 7r r (s,7,Ao)7ri(s,7,A ){^(s-)} 

Jo {^(s,A )} 2 

for r, Z = 1, . . . ,p + 1. 

Step HI 

We now examine the sum of U(7°, A°) and U(7°, A (-, 7 )) - U(7°, A°). From (1331) . 
we have 

[7,(7°, AoO, 7°)) - £/r(7°, A o) « n" 1 / «r(s) E E rfM ^) = - E 0*n 

Jo fc=i Z=l n fc=l 

where a r (s) is the limiting value of 7r r (s, 7 , K° Q )p{s— )/y(s, Aq) and /Zfc r is defined as 

Hkr= a r (s) VrfM H (s). 
Jo z=i 

Arguments in Yang and Prentice (1999, Appendix A) can be used to show that p(s—) has 
a limit. Also, clearly E[/^ r ] = 0. 
We thus have 

1 n 

fZ r ( 7 °,A°) + [f/ r (7°,A (-,7°)) -U r (Y,A° )] «-£(&• + J*ir)> 

71 8=1 

which is a mean of n iid random variables. Hence n 1 ' 2 {U r ('y , Aq) + [{7 r (7°, A (-, 7 )) — 
t/ r (7°,Ao)]} is asymptotically normally distributed. The covariance matrix may be esti- 
mated by V(7) + 6(7) + 6(7), where 

1 - 

Crl{l) = ~ £(&>/4 + £*j/4)> r, / = 1, . . . , p + 1, 



n i=l 



with 



h y(s,A ) j=1 

28 



s 



and 



Mij(t) = Nij(t) - /*exp0 T Z ij )y ij (w)^(7,Ao,w-)rfA o H- 

J 



Step IV 



First order Taylor expansion of U(7, A (-, 7)) about 7 = (f3 oT ,9°) T gives 



U( 7 , A (-, 7)) = U( 7 °, Ao(-, 7°)) + D( 7 °)(7 - 7°) T + o p (l), 



where 



D ls {i)=dU l {iM;*i))l&i8 



for Z, s — 1, . . . ,p + 1, with 7 P+1 = 9. 
For /, s = 1, . . . , p we have 



A s (7) = 



-n 



i=i \<K(7,A ,t) j=1 13 dp s 
3 i(7,A o ,r) 0^(7, A ,r) 



u (7,A o ,r) 0?i(7,A o ,r) 



2^ Hij(Tij)Ziji 



0ft 



and 



<9AA (r fc ) 



<9/3 s 



exp(/3 T Z ij ) + A (T lj A r k ) exp(/3 T Z^)Z 



^ 0h(7, A ,r fc _i) 
0|j(7, A ,r fc _i) 3i (7, A ,r fe _i) 1 dH L {T k -i) 



+ 



1 Ll0ii(7, A ,r fe _i) H (7,A o ,T fe _i) J <9/5 s 
02i(7,A o ,r fc _i) ^ 

— 2^ n ij{ T k)^ijs 



Ri.{r k ) 



0ii(7, A ,Tfc_i) i= i 



For Z = 1, . . . , p we have 

n rVi „-iy f ^(7, A„,r) ^ dHij{Tij) 



7i 1 0h(7, A ,r) i=1 
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02?(7 ; A O ^) _ 02i(7 > A o ,r)^ ) (7, A ,r) 
H (7,A o ,r) ^(7, A ,t) 



>ii(7, A ,r) H (7,A o ,r) 



(35) 

j j=i 



and 

D (p+i)i(l) 
Finally, 



ff(7,A o ,r)0 2i (7,Ao,r) 
0^(7, A ,r) 



4?(7,Ao,r) I «9tf,(r) 



h (7,A o ,t) 0$ 



D(p+i)(p+i)(7) = " 



+ 



i=i 



^ ) (7,A o ,r)0 2l (7,A o ,r) 0^( 7 ,A o ,r) 



0ii(7, A , 



g(7,A ,r) 
0ii(7, A ) 



where 



« >tf) (7,Ao,r) 



u^^expl-u^.^)} 



0ii(7,A o ,r) 
d 2 f(w 



mxr) 

06 



dd 2 



dw, 



dH i:i (T k ) _ dA (Tij A r fc ) 



exp(/3 T Z^), 



and 



<9AA (T fc ) 
06 



Ri.(r k ) 



i=i 0ii(7> A ,r fe _i) j 

02? (7, A , r fc _i) 02i(7, A , r fc _i)0S?(7, A , r fc _i) 



i=l 



(7,A ,T fc _i; 



(7,A ,T fe _i) 



OHiXr k ~i) j(f4i(j, A ,r fc _i) 03i(7, A , T fe _i' 



^(7,A ,T fc _i) 



'ii(7> A ,r fe _^ 



Step V 



(36) 



(37) 



Combining the results above we get that n 1 / 2 (7 — 7 ) is asymptotically zero- mean 
normally distributed with a covariance matrix that can be consistently estimated by 



D-^TMVft) + G( 7 ) + C( 7 )}D- 1 (7) T 
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7.5 Definition and behavior of hi(r,s) 
The quantity hi(r, s) appearing in (j3Tj) is given by 

* M = {lb!** + rA)}3 1 1 Rds)Vu(r, s) g exp(/3%) A(r„- A 5) 
Ri.(s)v2i(r, s) ^ T 

where A(T y As) = A (Xy As) - Ag(T y A s) and 

f 2t ( 7 °, Ag + r A, g) 1 3 4t ( 7 °, Ag + r A, s) 2 i(7°, Ag + rA, s)0 3 j(7°, Ag + rA, g) 
^ SJ 1 0ii(7°, Ag + r A, s) J ^(7°, Ag + rA, s) {^(7°, Ag + rA, s)} 2 

For all i = l,...,n and s G [0, r], we have < Ri.(s) < mz A where z/ is as in (JBJ) 
Moreover, for = 1, . . . , 4, we have 



E[^ m - +{fc - 1) exp{-^me^ z Ag(r)}] < <fe( 7 , Ag, S ) < E[W[™* +M ] 



where r max = argmaxi< r < m E(W[), r min = argmini< r < m E(W[). Hence, ^ and 7/ 2i are 
bounded. In addition, the the proof of Lemma 2 show that y(s, A° + r A) is uniformly 
bounded away from zero for n sufficiently large. Finally, in the consistency proof we 
obtained ||A|| = o(l). Therefore hi(r,s) is o(l) uniformly in r and s. 
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Table 1: Simulation results for the gamma frailty model with single normal covariate; 
n = 300 and family size equals 2. Z ~ N{0, 1); (3° = log(2) or log(3); 0° = 2 







Our approach EM algorithm Our approach EM algorithm 









P = ln{2); 


35% censoring 






Empirical mean 


0.692 




0.689 


1.978 




1.969 


Empirical SD 


0.248 




0.253 


0.268 




0.308 


Mean estimated SD 


0.242 




- 


0.242 




- 


95% Wald-type CI 


95.6 




- 


96.3 




- 


Correlation 




0.952 


P = ln(2); 


85% censoring 


0.989 




Empirical mean 


0.699 




0.693 


1.942 




1.942 


Empirical SD 


0.479 




0.481 


0.897 




0.936 


Mean estimated SD 


0.442 






0.919 






95% Wald-type CI 


96.6 






95.0 






Correlation 




0.952 


P = Zn(3); 


30% censoring 


0.989 




Empirical mean 


1.102 




1.078 


1.985 




1.961 


Empirical SD 


0.255 




0.266 


0.265 




0.259 


Mean estimated SD 


0.231 






0.279 






95% Wald-type CI 


96.9 






96.1 






Correlation 




0.951 


P = Zn(3); 


80% censoring 


0.982 




Empirical mean 


1.099 




1.088 


1.921 




1.870 


Empirical SD 


0.465 




0.466 


0.800 




0.810 


Mean estimated SD 


0.443 






0.797 






95% Wald-type CI 


94.2 






96.3 






Correlation 




0.957 






0.993 
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